Understanding Parallelism in Computer Programs
Exploring how to identify and classify parallelism in programs based on grain size
This classification is based on recognizing the parallelism in a program to be executed on a multiprocessor system. The idea is to identify the sub-tasks or instructions in a program that can be executed in parallel.
For example, suppose a program contains three statements, and statements S1 and S2 can be interchanged without affecting the result. That means they are not order-dependent, so S1 and S2 can be executed in parallel, as shown below.
Sequential Execution:
S1: a = b + c
S2: d = e + f
S3: g = a + d
Parallel Execution:
S1: a = b + c S2: d = e + f
↓ ↓
S3: g = a + d
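The same structure can be sketched in code. The example below (plain JavaScript with hypothetical values; Promise.all merely stands in for real parallel hardware) starts S1 and S2 together and computes S3 only after both results are available:

```javascript
// Minimal sketch: S1 and S2 are independent, so they may be started together;
// S3 needs both results, so it runs afterwards.
async function runSegment(b, c, e, f) {
  const s1 = async () => b + c; // S1: a = b + c
  const s2 = async () => e + f; // S2: d = e + f

  // No dependence between S1 and S2, so they can proceed concurrently.
  const [a, d] = await Promise.all([s1(), s2()]);

  // S3 is flow dependent on both S1 (a) and S2 (d).
  return a + d; // S3: g = a + d
}

runSegment(1, 2, 3, 4).then(g => console.log(g)); // prints 10
```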
However, checking for parallelism between the statements or processes of a program is not sufficient by itself. The decision to parallelize also depends on the following factors:
Number and types of processors available
How memory is structured and accessed
Data, control, and resource dependencies
As mentioned earlier, parallel computing requires that the segments to be run concurrently be independent of one another. Therefore, before implementing parallelism, all the conditions for parallelism between the segments need to be examined.
In this section, we discuss three kinds of dependencies between segments:
Data dependency: two or more instructions use the same data
Control dependency: dependencies that arise within control structures such as branches and loops
Resource dependency: instructions use the same shared resource
Data dependency refers to the situation where two or more instructions use the same data. The following kinds of data dependencies are identified:
Flow dependence: if instruction I2 follows I1 and the output of I1 becomes an input of I2, then I2 is said to be flow dependent on I1
Anti-dependence: when instruction I2 follows I1 such that the output of I2 overlaps with the input of I1 on the same data
Output dependence: when the outputs of the two instructions I1 and I2 overlap on the same data
I/O dependence: when read and write operations by two instructions are invoked on the same file
I1: a = b
I2: c = a + d
I3: a = c
This program segment contains instructions I1, I2, and I3 with the following dependencies: I2 is flow dependent on I1 (I2 reads a, which I1 writes); I3 is flow dependent on I2 (through c) and anti-dependent on I2 (I3 writes a, which I2 reads); and I3 is output dependent on I1 (both write a).
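These dependences can also be identified mechanically by comparing the read and write sets of each instruction. The following sketch (illustrative JavaScript; the data structure and helper names are our own, not from the original text) encodes the three instructions and prints the flow, anti-, and output dependences between each ordered pair:

```javascript
// Read and write sets for I1: a = b, I2: c = a + d, I3: a = c
const instrs = [
  { name: "I1", reads: ["b"],      writes: ["a"] },
  { name: "I2", reads: ["a", "d"], writes: ["c"] },
  { name: "I3", reads: ["c"],      writes: ["a"] },
];

// True if the two variable lists share at least one location.
const overlaps = (xs, ys) => xs.some(v => ys.includes(v));

for (let i = 0; i < instrs.length; i++) {
  for (let j = i + 1; j < instrs.length; j++) {
    const earlier = instrs[i], later = instrs[j];
    if (overlaps(earlier.writes, later.reads))
      console.log(`${later.name} is flow dependent on ${earlier.name}`);
    if (overlaps(earlier.reads, later.writes))
      console.log(`${later.name} is anti-dependent on ${earlier.name}`);
    if (overlaps(earlier.writes, later.writes))
      console.log(`${later.name} is output dependent on ${earlier.name}`);
  }
}
```

Running the sketch reports that I2 is flow dependent on I1, I3 is output dependent on I1, and I3 is both flow and anti-dependent on I2, matching the analysis above.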
Instructions or segments in a program often contain control structures such as branches and loops, so dependency among statements can also arise within control structures. The order in which instructions inside a control structure will execute is generally not known until run time.
for (i = 1; i <= n; i++) {
    if (x[i - 1] == 0)
        x[i] = 0;
    else
        x[i] = 1;
}
In the above control structure, the successive iterations are dependent on one another because the value of x[i] in iteration i depends on the value of x[i-1] from the previous iteration.
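For contrast, a loop whose iterations use only their own elements has no loop-carried dependence, and its iterations could in principle be distributed across processors. The sketch below (illustrative JavaScript with small made-up arrays) shows the two forms side by side:

```javascript
const n = 4;
const x = [0, 0, 0, 0, 0];            // hypothetical data, for illustration only
const a = [0, 1, 2, 3, 4];
const b = [0, 10, 20, 30, 40];
const y = new Array(n + 1);

// Loop-carried dependence: iteration i reads x[i - 1], which iteration i - 1
// writes, so the iterations must execute in order.
for (let i = 1; i <= n; i++) {
  x[i] = (x[i - 1] == 0) ? 0 : 1;
}

// No loop-carried dependence: each iteration touches only its own elements,
// so the iterations could be divided among processors.
for (let i = 1; i <= n; i++) {
  y[i] = a[i] + b[i];
}
```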
Parallelism between instructions can also be affected by shared resources. If two instructions use the same shared resource, a resource dependency exists, for example:
When floating point units or registers are shared
When memory is being shared
To execute instructions or blocks of instructions in parallel, it must be ensured that the instructions are independent of one another. Instructions can be data dependent, control dependent, or resource dependent on each other. Here we consider only data dependency among statements when deciding on parallel execution.
A. J. Bernstein elaborated the analysis of data dependency and derived conditions for deciding whether instructions or processes can be executed in parallel. Bernstein's conditions are based on the following two sets of variables:
The read (input) set RI, which consists of the memory locations read by statement or instruction SI
The write (output) set WI, which consists of the memory locations written by SI
The sets RI and WI are not necessarily disjoint, since the same memory location may be both read and written by SI.
The following are Bernstein's conditions for parallelism, used to determine whether two statements S1 and S2 can be executed in parallel:
Locations in R1 from which S1 reads and the locations W2 onto which S2 writes must be mutually exclusive: R1 ∩ W2 = φ
Locations in R2 from which S2 reads and the locations W1 onto which S1 writes must be mutually exclusive: R2 ∩ W1 = φ
The memory locations W1 and W2 onto which S1 and S2 write, should not overlap: W1 ∩ W2 = φ
To show the operation of Bernstein's conditions, consider the following instructions of a sequential program:
I1: x = (a + b) / (a * b)
I2: y = (b + c) * d
I3: z = x² + (a * e)
Now, the read set and write set of I1, I2 and I3 are as follows:
R1 = {a, b}
W1 = {x}
R2 = {b, c, d}
W2 = {y}
R3 = {x, a, e}
W3 = {z}
Now let us find out whether I1 and I2 can be executed in parallel:
R1 ∩ W2 = {a, b} ∩ {y} = φ
R2 ∩ W1 = {b, c, d} ∩ {x} = φ
W1 ∩ W2 = {x} ∩ {y} = φ
That means I1 and I2 are independent of each other and can be executed in parallel.
Similarly, for I1 || I3:
R1 ∩ W3 = {a, b} ∩ {z} = φ
R3 ∩ W1 = {x, a, e} ∩ {x} = {x} ≠ φ
W1 ∩ W3 = {x} ∩ {z} = φ
Hence I1 and I3 are not independent of each other and cannot be executed in parallel.
For I2 || I3:
R2 ∩ W3 = {b, c, d} ∩ {z} = φ
R3 ∩ W2 = {x, a, e} ∩ {y} = φ
W2 ∩ W3 = {y} ∩ {z} = φ
Hence, I2 and I3 are independent of each other and can be executed in parallel.
I1 || I2: can run in parallel
I1 || I3: cannot run in parallel
I2 || I3: can run in parallel
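The same checks can be carried out programmatically. The sketch below (illustrative JavaScript, with the read and write sets taken from the example above) evaluates the three Bernstein conditions for each pair and reproduces the results just listed:

```javascript
// Read and write sets from the example above.
const R = { I1: ["a", "b"], I2: ["b", "c", "d"], I3: ["x", "a", "e"] };
const W = { I1: ["x"],      I2: ["y"],           I3: ["z"] };

// True if the two variable lists have no memory location in common.
const disjoint = (xs, ys) => xs.every(v => !ys.includes(v));

// Bernstein's conditions: Ri ∩ Wj = φ, Rj ∩ Wi = φ, Wi ∩ Wj = φ
const canRunInParallel = (i, j) =>
  disjoint(R[i], W[j]) && disjoint(R[j], W[i]) && disjoint(W[i], W[j]);

console.log("I1 || I2:", canRunInParallel("I1", "I2")); // true: can run in parallel
console.log("I1 || I3:", canRunInParallel("I1", "I3")); // false: R3 ∩ W1 = {x}
console.log("I2 || I3:", canRunInParallel("I2", "I3")); // true: can run in parallel
```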
Grain size, or granularity, is a measure of how much computation is involved in a process. Grain size is determined by counting the number of instructions in a program segment.
Fine grain: contains approximately fewer than 20 instructions
Medium grain: contains approximately fewer than 500 instructions
Coarse grain: contains approximately 1,000 instructions or more
Fine Grain Example:
a = b + c
d = e + f
g = a + d
Medium Grain Example:
function calculateSum(arr) {
let sum = 0;
for (let i = 0; i < arr.length; i++) {
sum += arr[i];
}
return sum;
}
Coarse Grain Example:
function processImage(image) {
// Image processing algorithm
// Hundreds or thousands of instructions
return processedImage;
}
Based on grain size, parallelism in a program can be categorized into different levels. These parallelism levels form a hierarchy in which processes become finer-grained at the lower levels; as the level increases, the degree of parallelism decreases.
Instruction level: the most basic level, with the highest degree of parallelism. Fine grain size, with just a few instructions per grain.
Loop level: involves parallelizing the iterations of loops. Also fine grain size. Simple loops are easy to parallelize.
Procedure level: consists of procedures, subroutines, or subprograms. Medium grain size, with up to thousands of instructions per procedure.
Program level: the highest level, consisting of independent programs. Coarse grain size, with tens of thousands of instructions per program.
| Parallelism Level | Grain Size | Characteristics | Implementation |
|---|---|---|---|
| Instruction Level | Fine | Highest degree of parallelism, more overhead | Hardware (superscalar, VLIW) |
| Loop Level | Fine | Simple loops easy to parallelize, recursive loops difficult | Compilers can achieve automatically |
| Procedure Level | Medium | Exploited by programmers; not yet automated by compilers | Programmer (manual parallelization) |
| Program Level | Coarse | Tens of thousands of instructions, independent programs | Time sharing, operating system |
| Grain Size | Parallelism Level | Typical System Type |
|---|---|---|
| Fine Grain | Instruction or Loop Level | SIMD organization |
| Medium Grain | Procedure or Subprogram Level | Loosely coupled systems |
| Coarse Grain | Program Level | Tightly coupled or shared memory multiprocessors (e.g., Cray Y-MP) |
Tightly coupled (shared memory) systems: typically used for coarse grain parallelism. Examples include shared memory multiprocessors such as the Cray Y-MP.
Loosely coupled systems: utilized to execute medium grain program segments. Distributed memory systems fall into this category.
SIMD organization: fine grain parallelism is exploited in SIMD (Single Instruction, Multiple Data) computers.
Fine grain parallelism requires more communication and synchronization overhead
Coarse grain parallelism is generally more scalable to larger systems
Fine grain parallelism is often more complex to program effectively
Flynn's classification: based on instruction and data streams (SISD, SIMD, MISD, MIMD)
Handler's classification: categorizes computers at three distinct tiers: PCU, ALU, and BLC
Structural classification: tightly coupled (shared memory) and loosely coupled (distributed memory) systems
Grain size classification: based on the amount of computation in a process (fine, medium, coarse)
Dependencies: data, control, and resource dependencies affect parallelism
Bernstein's conditions: mathematical conditions to determine whether instructions can run in parallel
Grain sizes: fine (<20 instructions), medium (<500 instructions), coarse (≥1000 instructions)
Parallelism levels: instruction, loop, procedure, and program levels, with decreasing degree of parallelism
Understanding grain size classification is essential for designing efficient parallel computing systems. The choice of grain size affects the system's performance, scalability, and programming complexity. Fine-grained parallelism offers high parallelism but with more overhead, while coarse-grained parallelism is easier to implement but may not utilize all available resources optimally.
By analyzing dependencies using Bernstein conditions and selecting the appropriate grain size for the target architecture, we can design parallel systems that effectively balance parallelism with overhead, leading to optimal performance for a wide range of applications.